Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code

نویسندگان

  • Huihui Wei
  • Ming Li
چکیده

Software clone detection, aiming at identifying out code fragments with similar functionalities, has played an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source codes with hand-crafted features using lexical or syntactical information, or unsupervised deep features, which makes it difficult to detect the functional clone pairs, i.e., pieces of codes with similar functionality but differing in both syntactical and lexical level. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate the clone detection as a supervised learning to hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. Such framework learns hash codes by exploiting the lexical and syntactical information for fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability

The software architecture plays a fundamental role in the comprehension and maintenance of large and complex systems. However, unlike classes or packages, this information is not explicitly represented in the code, giving rise to the definition of different approaches to automatically recover the original architecture of a system. Software architecture recovery (SAR) techniques aim at extractin...

متن کامل

Method-level code clone detection for java through hybrid approach

A Software clone is an active research area where several researchers have investigated techniques to automatically detect duplicated code in programs. However their researches have limitations either in finding the structural or functional clones. Moreover, all these techniques detected only the first three types of clones. In this paper, we propose a hybrid approach combining metric-based app...

متن کامل

Deep Learning Similarities from Different Representations of Source Code

Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, refactoring, etc. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Syntax Trees of two code components. These featur...

متن کامل

Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code from lexical and structural information correlated their relevance by measuring their si...

متن کامل

Code-Copying in the Balochi Language of Sistan

This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017